180 research outputs found

    Discrimination of biofilm samples using pattern recognition techniques

    Biofilms are complex aggregates formed by microorganisms such as bacteria, fungi and algae, which grow at the interfaces between water and natural or artificial materials. They are actively involved in the sorption and desorption of metal ions in water and reflect the environmental conditions of the recent past. Biofilms can therefore be used as bioindicators of water quality. The goal of this study was to determine whether biofilms developed in different aquatic systems could be successfully discriminated using data on their elemental compositions. Biofilms were grown on natural or polycarbonate materials in flowing water, standing water and seawater bodies. Using an unsupervised technique, principal component analysis (PCA), and several supervised methods, namely classification and regression trees (CART), discriminant partial least squares regression (DPLS) and uninformative variable elimination–DPLS (UVE-DPLS), we could confirm the uniqueness of sea biofilms and distinguish between flowing water and standing water biofilms. The CART, DPLS and UVE-DPLS discriminant models were validated with an independent test set selected either by the Kennard and Stone method or by the duplex algorithm. The best model was obtained from CART, with a 100% correct classification rate for the test set designed by the Kennard and Stone algorithm. With CART, one variable describing the Mg content in the biofilm water phase was found to be important for discriminating between flowing water and standing water biofilms.
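
The abstract above mentions selecting samples with the Kennard and Stone method. As a rough illustration only (not the authors' implementation), the usual form of that rule can be sketched as follows: start from the two most distant samples, then repeatedly add the sample whose nearest already-selected neighbour is farthest away. The function name `kennard_stone`, the use of Euclidean distance, and the toy points are assumptions made for this sketch; whether the selected subset serves as the training or the test set is a choice left to the analyst.

```python
import math


def kennard_stone(points, n_select):
    """Return indices of n_select points chosen by a Kennard-Stone style rule.

    Hypothetical helper for illustration; assumes Euclidean distance.
    """
    n = len(points)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and len(points)")

    # Full pairwise distance table (fine for small data sets).
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Step 1: seed the selection with the two most distant points.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [first, second]
    remaining = set(range(n)) - {first, second}

    # Step 2: repeatedly add the point farthest from its nearest selected point.
    while len(selected) < n_select:
        nxt = max(
            sorted(remaining),
            key=lambda k: min(dist[k][s] for s in selected),
        )
        selected.append(nxt)
        remaining.remove(nxt)
    return selected


if __name__ == "__main__":
    toy = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (5.0, 0.0)]
    print(kennard_stone(toy, 3))
```

Because the rule is deterministic, repeated runs on the same data give the same split, which is one reason it is often preferred over random partitioning for small data sets.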

    Imaging cervical cytology with scanning near-field optical microscopy (SNOM) coupled with an IR-FEL

    Cervical cancer remains a major cause of morbidity and mortality among women, especially in the developing world. Increased synthesis of proteins, lipids and nucleic acids is a pre-condition for the rapid proliferation of cancer cells. We show that scanning near-field optical microscopy, in combination with an infrared free electron laser (SNOM-IR-FEL), is able to distinguish between normal and squamous low-grade and high-grade dyskaryosis, and between normal and mixed squamous/glandular pre-invasive and adenocarcinoma cervical lesions, at designated wavelengths associated with DNA, Amide I/II and lipids. These findings demonstrate the promise of the SNOM-IR-FEL technique in obtaining chemical information relevant to the detection of cervical cell abnormalities and cancer diagnosis at spatial resolutions below the diffraction limit (on the order of 0.2 μm). We compare these results with analyses following attenuated total reflection Fourier-transform infrared (ATR-FTIR) spectroscopy; although this latter approach has been demonstrated to detect underlying cervical atypia missed by conventional cytology, it is limited by a spatial resolution of ~3 μm to 30 μm due to the optical diffraction limit.

    Linear, Deterministic, and Order-Invariant Initialization Methods for the K-Means Clustering Algorithm

    Over the past five decades, k-means has become the clustering algorithm of choice in many application domains, primarily due to its simplicity, time/space efficiency, and invariance to the ordering of the data points. Unfortunately, the algorithm's sensitivity to the initial selection of the cluster centers remains its most serious drawback. Numerous initialization methods have been proposed to address this drawback. Many of these methods, however, have time complexity superlinear in the number of data points, which makes them impractical for large data sets. On the other hand, linear methods are often random and/or sensitive to the ordering of the data points. These methods are generally unreliable in that the quality of their results is unpredictable. Therefore, it is common practice to perform multiple runs of such methods and take the output of the run that produces the best results. Such a practice, however, greatly increases the computational requirements of the otherwise highly efficient k-means algorithm. In this chapter, we investigate the empirical performance of six linear, deterministic (non-random), and order-invariant k-means initialization methods on a large and diverse collection of data sets from the UCI Machine Learning Repository. The results demonstrate that two relatively unknown hierarchical initialization methods due to Su and Dy outperform the remaining four methods with respect to two objective effectiveness criteria. In addition, a recent method due to Erisoglu et al. performs surprisingly poorly. Comment: 21 pages, 2 figures, 5 tables, Partitional Clustering Algorithms (Springer, 2014). arXiv admin note: substantial text overlap with arXiv:1304.7465, arXiv:1209.196

    Automated systems to identify relevant documents in product risk management

    Background: Product risk management involves critical assessment of the risks and benefits of health products circulating in the market. One of the important sources of safety information is the primary literature, especially for newer products with which regulatory authorities have relatively little experience. Although the primary literature provides vast and diverse information, only a small proportion of it is useful for product risk assessment work. Hence, the aim of this study is to explore the possibility of using text mining to automate the identification of useful articles, which would reduce the time taken for literature searches and thereby improve work efficiency. In this study, term-frequency inverse document-frequency values were computed for predictors extracted from the titles and abstracts of articles related to three tumour necrosis factor-alpha blockers. A general automated system was developed using only general predictors and was tested for its generalizability using articles related to four other drug classes. Several specific automated systems were developed using both general and specific predictors and training sets of different sizes in order to determine the minimum number of articles required for developing such systems. Results: The general automated system had an area under the curve (AUC) value of 0.731 and was able to rank 34.6% and 46.2% of the total number of 'useful' articles among the first 10% and 20% of the articles presented to the evaluators when tested on the generalizability set. However, its use may be limited by the subjective definition of useful articles. For the specific automated system, it was found that only 20 articles were required to develop a system with a prediction performance (AUC 0.748) better than that of the general automated system. Conclusions: Specific automated systems can be developed rapidly and avoid problems caused by the subjective definition of useful articles. Thus the efficiency of product risk management can be improved with the use of specific automated systems.
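
The abstract above relies on term-frequency inverse document-frequency (TF-IDF) values as predictors. The study does not state which weighting variant was used, so the following is only a minimal sketch under stated assumptions: term frequency is the raw count divided by document length, inverse document frequency is the natural logarithm of (number of documents / number of documents containing the term), and tokenization is a plain lowercase whitespace split. The function name `tf_idf` and the example strings are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(documents)
    tokenized = [doc.lower().split() for doc in documents]

    # Document frequency: in how many documents each term occurs at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append(
            {
                term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
                for term, count in counts.items()
            }
        )
    return weights


if __name__ == "__main__":
    docs = ["drug safety report", "drug efficacy trial"]
    for row in tf_idf(docs):
        print(row)
```

Under this variant, a term that occurs in every document receives a weight of zero, so only terms that help tell documents apart contribute to a ranking model.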

    Metabolic profiling of human brain metastases using in vivo proton MR spectroscopy at 3T

    Background: Metastases to the central nervous system from different primary cancers are an oncologic challenge, as the overall prognosis for these patients is generally poor. The incidence of brain metastases varies with the type of primary cancer and is probably increasing because improved therapies for extracranial metastases prolong patients' overall survival and thereby the time available for brain metastases to develop. In addition, greater access to improved neuroimaging techniques can provide earlier diagnosis. The aim of this study was to investigate the feasibility of using proton magnetic resonance spectroscopy (MRS) and multivariate analyses to characterize brain metastases originating from different primary cancers, to assess changes in spectra during radiation treatment, and to correlate the spectra with clinical outcome after treatment. Methods: Patients (n = 26) with brain metastases were examined using single voxel MRS on a 3T clinical MR system. Five patients were excluded due to poor spectral quality. The spectra were obtained before the start of treatment (n = 21 patients), immediately after treatment (n = 6 patients) and two months after the end of treatment (n = 4 patients). Principal component analysis (PCA) and partial least squares regression analysis (PLS) were applied in order to identify clustering of spectra according to the origin of the metastases and to relate the clinical outcome (survival) of the patients to spectral data from the first MR examination. Results: The PCA results indicated that brain metastases from primary lung and breast cancer were separated into two clusters, while the metastases from malignant melanomas showed no uniformity. The PLS analysis showed a significant correlation between MR spectral data and survival five months after the MRS examination performed before the start of treatment. Conclusion: MRS-determined metabolic profiles analysed by PCA and PLS might give valuable clinical information when planning and evaluating the treatment of brain metastases, and also when deciding to terminate further therapies.

    Mass spectrometry and multivariate analysis to classify cervical intraepithelial neoplasia from blood plasma: an untargeted lipidomic study

    Cervical cancer is still an important public health issue, since it is the fourth most frequent type of cancer in women worldwide. Much effort has been dedicated to combating this cancer, in particular through the early detection of cervical pre-cancerous lesions. For this purpose, this paper reports the use of mass spectrometry coupled with multivariate analysis as an untargeted lipidomic approach to classifying 76 blood plasma samples as negative for intraepithelial lesion or malignancy (NILM, n = 42) or squamous intraepithelial lesion (SIL, n = 34). The crude lipid extract was directly analyzed with mass spectrometry for untargeted lipidomics, followed by multivariate analysis based on principal component analysis (PCA) and a genetic algorithm (GA) with support vector machines (SVM), linear (LDA) and quadratic (QDA) discriminant analysis. PCA-SVM models outperformed LDA and QDA results, achieving sensitivity and specificity values of 80.0% and 83.3%, respectively. Five types of lipids contributing to the distinction between NILM and SIL classes were identified, including prostaglandins, phospholipids, and sphingolipids for the former condition and Tetranor-PGFM and hydroperoxide lipid for the latter. These findings highlight the potential of mass spectrometry combined with chemometrics to discriminate between healthy women and those suffering from cervical pre-cancerous lesions.

    Some comments on the significance and development of midline behavior during infancy

    As the tonic neck reflex wanes, beginning between the 8th and 12th weeks and in most instances disappearing by the 16th week, the infant begins to become bilateral, makes symmetrical movements, and engages his hands in the midline, usually over the chest, while in a supine position. The developmental significance of such behavior is considered, for example its participation in the emerging sense of self and its role in the consolidation of emerging ego skills. Consideration is given to the possible implications of faulty midline behavior for development, and to whether failure to engage in an optimal amount of midline behavior, in interaction with other factors, can be used to alert observers to possible future developmental disturbances. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/43965/1/10578_2005_Article_BF01435498.pd

    Evaluation of extra-virgin olive oils shelf life using an electronic tongue-chemometric approach

    Physicochemical quality parameters and positive olfactory and gustatory-retronasal sensations of extra-virgin olive oils vary during storage, leading to a decrease in overall quality. This decline may prevent the oil from complying with its labeling and significantly reduce shelf life, resulting in important economic losses and negatively affecting consumer confidence. The feasibility of applying an electronic tongue to assess olive oils' usual commercial light storage conditions and storage time was evaluated and compared with the discrimination potential of physicochemical or positive olfactory/gustatory sensorial parameters. Linear discriminant models, based on subsets of 58 electronic tongue sensor signals, selected by the meta-heuristic simulated annealing variable selection algorithm, allowed the correct classification of olive oils according to the light exposure conditions and/or storage time (sensitivities and specificities for leave-one-out cross-validation: 82–96%). The predictive performance of the E-tongue approach was further evaluated using an external independent dataset selected using the Kennard-Stone algorithm and, in general, better classification rates (sensitivities and specificities for external dataset: 67–100%) were obtained compared to those achieved using physicochemical or sensorial data. The work carried out is thus a proof-of-principle that the proposed electrochemical device could be a practical and versatile tool for successfully discriminating, in a single fast electrochemical assay, olive oils with different storage times and/or exposed to different light conditions. The authors acknowledge the financial support from the strategic funding of UID/BIO/04469/2013 unit, from Project POCI-01-0145-FEDER-006984 (Associate Laboratory LSRE-LCM), funded by FEDER funds through COMPETE2020, Programa Operacional Competitividade e Internacionalização (POCI), and by national funds through FCT, Fundação para a Ciência e a Tecnologia, and under the strategic funding of UID/BIO/04469/2013 unit. Nuno Rodrigues thanks FCT, POPH-QREN and FSE for the Ph.D. Grant (SFRH/BD/104038/2014).

    Incorporating Existing Network Information into Gene Network Inference

    One methodology that has been successful in inferring gene networks from gene expression data is based on ordinary differential equations (ODE). However, new types of data continue to be produced, so it is worthwhile to investigate how to integrate these new data types into the inference procedure. One such data type is physical interactions between transcription factors and the genes they regulate, as measured by ChIP-chip or ChIP-seq experiments. These interactions can be incorporated into the gene network inference procedure as a priori network information. In this article, we extend the ODE methodology into a general optimization framework that incorporates existing network information in combination with regularization parameters that encourage network sparsity. We provide theoretical results proving convergence of the estimator for our method and show that the corresponding probabilistic interpretation also converges. We demonstrate our method on simulated network data and show that existing network information improves performance, overcomes the lack of observations, and performs well even when some of the existing network information is incorrect. We further apply our method to the core regulatory network of embryonic stem cells, utilizing predicted interactions from two studies as existing network information. We show that including the prior network information yields a more representative regulatory network than when no information is provided.